A System for Automatic Abbreviation Expansion
Abstract
A system for automatic abbreviation expansion was developed and tested for use with an AAC device. The system blends several technologies in a process that automatically expands user-generated abbreviations while additionally providing spell-checking. Using a series of heuristic rules and a statistical language model, the system combines a series of rule scores and probabilities to rank valid word candidates in a list for user selection. Stressing flexibility, the system requires that users follow only two intuitive rules for the construction of abbreviations. Testing revealed that the system was able to correctly expand 91.9% of a set of collected abbreviations (using a 5-member list), reducing keystrokes by 14.8% when used in an experimental communication task.
BACKGROUND
Users of AAC devices have a common desire to increase communication rate over that possible with current systems. A simple and effective method for accelerating communication for AAC users is abbreviation expansion (1). Early systems utilized a lookup procedure, where codes were cross-referenced with words or phrases stored in an abbreviation table. Although effective, this method requires the ongoing effort of memorizing new codes and maintaining an abbreviation table database. Demasco (2) improved upon this by developing a rule-based system that allowed natural abbreviations and unrestricted vocabulary access, eliminating the need to maintain a lookup table. This system is more flexible, but at the cost of reduced keystroke savings and some loss of generality. Additional functionality to separate misspellings or typing errors from abbreviation attempts is also important for an effective system. Mistyped or new words can mistakenly be interpreted as abbreviations and expanded into words unintended by the user. No currently available system provides a truly comprehensive approach to abbreviation expansion.
RESEARCH QUESTION
The main drawbacks of table lookup systems are the cognitive loads required to memorize and later recall the abbreviations stored in the table. Rule-based systems are inflexible and often require that the user apply unintuitive abbreviation rules when entering abbreviations. A truly comprehensive system must be flexible enough to handle a wide variety of inputs, process misspellings and typographical errors, and easily permit the addition of new words to the system lexicon, all while allowing the user to generate abbreviations naturally (2). Is there an intuitive abbreviation expansion system that can provide all of these capabilities?
STATEMENT OF THE PROBLEM
The challenge is to design a system that can accommodate the wide variety of abbreviation strategies used by different individuals. The system must provide users with the ability to generate abbreviations naturally within an interface that is both functional and effective. Keystroke savings is the primary goal, but the system must also manage other functional operations such as spellchecking, editing, and the addition of words to the system lexicon.
A SYSTEM FOR ABBREVIATION EXPANSION
RATIONALE
This new design addresses the weaknesses of the current expansion methods by combining several new and existing technologies into a flexible and intuitive abbreviation expansion system that can provide significant keystroke savings.
DESIGN
The system requires that the user follow two intuitive rules for creating their own abbreviations: i) the abbreviation must start with the same letter as the intended word, and ii) the letters of the abbreviation must appear in the same order as they appear in the intended word (2). Both rules impose a minimal cognitive load, encouraging users to generate abbreviations naturally, saving keystrokes whenever possible. All input is first referenced against a standard lookup table of abbreviations.
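The two abbreviation rules in the DESIGN section amount to a first-letter check followed by an in-order subsequence test. The following is a minimal Python sketch of that idea, not the original implementation: the function names, the case-insensitive comparison, and the use of a plain word list as the lexicon are all assumptions made for illustration.

```python
def follows_abbreviation_rules(abbreviation: str, word: str) -> bool:
    """Check the two abbreviation rules against one candidate word.

    Hypothetical helper: the name and the case-insensitive comparison
    are assumptions, not details given in the paper.
    """
    abbr = abbreviation.lower()
    target = word.lower()
    if not abbr or not target:
        return False
    # Rule i: the abbreviation starts with the same letter as the word.
    if abbr[0] != target[0]:
        return False
    # Rule ii: the abbreviation's letters appear in the word in the same
    # order. Testing membership on an iterator consumes it, so each
    # letter must be found after the previous match.
    remaining = iter(target)
    return all(letter in remaining for letter in abbr)


def generate_candidates(abbreviation: str, lexicon: list[str]) -> list[str]:
    """Return every lexicon word the abbreviation could stand for."""
    return [w for w in lexicon if follows_abbreviation_rules(abbreviation, w)]
```

With a toy lexicon of `["tomorrow", "tomato", "tremor", "time"]`, the input `"tmrw"` yields only `"tomorrow"`, while `"wdr"` is rejected for `"word"` because the letters are out of order.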
The system allows the specification of strict abbreviations in the lookup table. Similar to normal table entries, they force the system to exit the processing stream immediately. If necessary, the system continues processing after the table lookup and generates candidates which are ranked and ordered for presentation to the user. Candidates are generated for non-strict table entries and low probability words to correctly process misspellings that may result in a valid word (4). These unintended errors go undetected by current systems. Candidates are derived from the system lexicon using the two abbreviation rules and a spelling checker (4). The ranking algorithm combines a heuristic rule score with probabilities obtained from the system’s statistical language model to order candidates by likelihood. If none of the presented candidates is the intended word, the user can invoke a specific action, such as editing the input or adding the new word to the system dictionary. The system flow chart is shown in Figure 1.
DEVELOPMENT
The system was developed in software and integrated into an in-house AAC package for testing. Experimentally collected abbreviations were used to train the system and optimize the heuristic rule scoring algorithm and parameters of the ranking algorithm.
EVALUATION
The integrity of the design was evaluated with off-line testing and experiments involving a communication task. The off-line testing was achieved by analyzing a database of abbreviation-word pairs (independent of the set used for training). Abbreviations were used as inputs to the system and
[Figure 1: System flow chart. Nodes: User Input; String In Look-up Table?; Is Abbreviation Strict?; Decode Abbreviation; Generate Additional Candidates; String In Lexicon?; Low Probability Word?; Generate Candidates; Rank Candidates; Present Candidates & Options; Done.]
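The processing flow described in the DESIGN section and in Figure 1 can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the authors' implementation: the names `TableEntry` and `expand`, the per-word probability dictionary standing in for the statistical language model, the `low_prob` threshold, the rule score (fraction of the word's letters that were typed), and the weighted-sum combination are all hypothetical. The spelling-checker candidate source mentioned in the text is omitted.

```python
from dataclasses import dataclass


@dataclass
class TableEntry:
    """One lookup-table entry; `strict` entries bypass candidate ranking."""
    expansion: str
    strict: bool = False


def follows_rules(abbr: str, word: str) -> bool:
    """First-letter match plus in-order subsequence test (the two rules)."""
    if not abbr or not word or abbr[0] != word[0]:
        return False
    rest = iter(word)
    return all(ch in rest for ch in abbr)


def expand(text, table, word_prob, low_prob=1e-4, rule_weight=0.5, list_size=5):
    """Return a ranked candidate list for one input string.

    `word_prob` maps lexicon words to probabilities and is an assumed
    stand-in for the language model; `low_prob` and `rule_weight` are
    hypothetical tuning parameters.
    """
    text = text.lower()
    candidates = []
    entry = table.get(text)
    if entry is not None:
        if entry.strict:
            # Strict abbreviation: decode and exit immediately.
            return [entry.expansion]
        candidates.append(entry.expansion)
    elif word_prob.get(text, 0.0) >= low_prob:
        # A reasonably likely lexicon word is accepted as typed.
        return [text]
    # Non-strict entries, low-probability words, and unknown strings
    # all get candidates generated from the lexicon.
    for word in word_prob:
        if word not in candidates and follows_rules(text, word):
            candidates.append(word)

    def score(word):
        # Assumed heuristic: the more of the word that was typed,
        # the higher the rule score.
        rule_score = len(text) / len(word)
        return rule_weight * rule_score + (1 - rule_weight) * word_prob.get(word, 0.0)

    return sorted(candidates, key=score, reverse=True)[:list_size]
```

An empty result corresponds to the case where no candidate fits, at which point the user would edit the input or add the word to the dictionary. The default `list_size=5` mirrors the 5-member list used in the reported testing.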
Similar resources
A Probabilistic Flexible Abbreviation Expansion System for Users With Motor Disabilities
In this paper we describe the initial design, training and evaluation of a prototype system enabling the automatic and flexible expansion of an abbreviated, typed text input, into a reconstructed sentence. The system’s target user group is cognitively unimpaired users with motor disabilities, for whom typing can be slow and tiring. It is intended that, by reducing the number of keystrokes requi...
Automatic expansion of abbreviations by using context and character information
Unknown words such as proper nouns, abbreviations, and acronyms are a major obstacle in text processing. Abbreviations, in particular, are difficult to read/process because they are often domain-specific. In this paper, we propose a method for automatic expansion of abbreviations by using context and character information. In previous studies dictionaries were used to search for abbreviation ex...
Improving Automatic Abbreviation Expansion within Source Code to Aid in Program Search Tools
Software maintenance is an important part of the software lifecycle. Understanding large software systems that are unfamiliar can be difficult for maintenance programmers. Intelligent and robust search tools are one method for facilitating program understanding and comprehension. One of the major problems associated with improving search tools is the use of abbreviations within software. The fo...
Vocabulary expansion through automatic abbreviation generation for Chinese voice search
Long named entities are often abbreviated in spoken Chinese, and this usually leads to out-of-vocabulary (OOV) problems in speech recognition applications. The generation of Chinese abbreviations is much more complex than that of English abbreviations, most of which are acronyms and truncations. In this paper, we propose a new method for automatically generating abbreviations for Chinese named en...
Hippocratic Abbreviation Expansion
Incorrect normalization of text can be particularly damaging for applications like text-to-speech synthesis (TTS) or typing auto-correction, where the resulting normalization is directly presented to the user, versus feeding downstream applications. In this paper, we focus on abbreviation expansion for TTS, which requires a “do no harm”, high precision approach yielding few expansion errors at ...
Publication date: 1999